%{{{ setup
\documentclass[conference]{IEEEtran}
\usepackage{amsmath,url,color,soul, amssymb}
\usepackage{setspace, graphicx}
\usepackage[x11names,rgb]{xcolor}
\usepackage{algorithm, algorithmic}
\usepackage{paralist}
\usepackage{tikz}
\usetikzlibrary{snakes}
\usetikzlibrary{arrows}
\usetikzlibrary{shapes}
\usetikzlibrary{backgrounds}
\usetikzlibrary{patterns}
\usepackage{pgfplots}
\usepackage{balance}
\usepackage{fancyhdr}
\usepackage[cm]{fullpage}
\usepackage[scriptsize,tight]{subfigure}
\usepackage[
pdftitle={Parallel Fast Gauss Transform},
pdfcreator={pdftex},
pdfsubject={SCXY 2010},
hyperindex = {true},
colorlinks = {true},
linkcolor = {blue},
citecolor = {blue}
]{hyperref} 
\graphicspath{{figs/}}

\input macros.tex

\begin{document}

\title{Parallel Fast Gauss Transform} 

\author{\IEEEauthorblockN{Rahul S. Sampath}
\IEEEauthorblockA{Oak Ridge National Laboratory\\
       Oak Ridge, TN 37831\\
       Email: sampathrs@ornl.gov} 

\and
\IEEEauthorblockN{Hari Sundar}
\IEEEauthorblockA{Siemens Corporate Research \\
       Princeton, NJ 08540 \\
       Email: hari.sundar@siemens.com}

\and
\IEEEauthorblockN{Shravan K. Veerapaneni}
\IEEEauthorblockA{New York University \\
       New York, NY 10012 \\
       Email: shravan@cims.nyu.edu}
}

\date{}

\maketitle

\thispagestyle{fancy}
\lhead{}
\rhead{}
\chead{}
\lfoot{\footnotesize{\copyright \hspace{0.1cm} 2010 IEEE \hspace{0.3cm} Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted 
component of this work in other works must be obtained from the IEEE. SC10 November 2010, New Orleans, Louisiana, USA 978-1-4244-7558-2/10/\$26.00}}
\rfoot{}
\cfoot{}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}

\begin{abstract}
\input abstract.tex
\end{abstract}

\section{Introduction}  
\label{s:intro}
\input intro.tex

\section{Overview of FGT for uniform distributions} 
\label{sc:fgt}
\input fgt.tex

\section{Translation scheme} 
\label{sc:sweep}
\input sweep.tex

\section{FGT for non-uniform distributions} 
\label{sc:nonuniform}
\input nonuniform.tex

\input parallelNonUniform.tex

\section{Results}
\label{sc:results}
\input results.tex

\section{Conclusions}
\label{sc:conclusions}
We have presented fast, adaptive, parallel algorithms for computing discrete spatial transforms with Gaussian-type kernels.
We introduced a novel translation scheme that leads to significant speed-ups for non-uniform distributions, and we described a
new paradigm that uses octrees to compute these transforms on non-uniform distributions with optimal complexity.
We demonstrated that our implementation scales well for both uniform and non-uniform distributions. The
code developed as part of this work will be made publicly available under the terms of the GNU General Public License (GPL).
 
Several further accelerations can be incorporated into our implementation. The most important of these is the
Hermite-to-plane-wave conversion scheme of \cite{fggt}, which is known to yield significant speed-ups. Another important direction
is overlapping communication with computation, which is possible for almost all of the communication steps. For example,
the plane-wave expansions can be formed first for the boxes at the inter-processor boundaries; while these are being communicated
across processors, the plane-wave expansions for the interior boxes can be formed. In addition, we plan to port
the local operations to OpenCL \cite{opencl} so that the code can run on heterogeneous platforms, including multicore CPUs and GPUs.
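
The overlap of communication and computation outlined above could be organized roughly as follows. This is only a minimal sketch under stated assumptions, not a description of the current implementation: the sets $B$ and $I$ are notation introduced here for illustration, and the sketch assumes a message-passing layer with non-blocking point-to-point operations, for which the MPI calls named in the comments are one possible choice.

\begin{algorithmic}
\STATE $B \leftarrow$ local boxes at an inter-processor boundary \COMMENT{hypothetical notation}
\STATE $I \leftarrow$ remaining (interior) local boxes
\FORALL{boxes $b \in B$}
\STATE form the plane-wave expansion of $b$
\ENDFOR
\STATE post non-blocking receives for the expansions expected from neighboring processors \COMMENT{e.g., \texttt{MPI\_Irecv}}
\STATE post non-blocking sends of the expansions of the boxes in $B$ \COMMENT{e.g., \texttt{MPI\_Isend}}
\FORALL{boxes $b \in I$}
\STATE form the plane-wave expansion of $b$ \COMMENT{proceeds while messages are in flight}
\ENDFOR
\STATE wait for all pending sends and receives to complete \COMMENT{e.g., \texttt{MPI\_Waitall}}
\end{algorithmic}

In such a scheme, the achievable overlap would depend on the ratio of interior to boundary boxes on each processor, since the interior work is what hides the communication time.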

Beyond the parallel algorithms described in this paper, fast solvers for linear parabolic PDEs
also require convolutions with certain non-standard kernels \cite{li09, skv09}. We are currently investigating fast
parallel algorithms for those kernels. Together with the algorithms presented here, we believe they will provide promising alternatives
for applications in high-performance computing.

\section*{Acknowledgments}
\input acknowledge.tex

\balance

\bibliographystyle{IEEEtran}
\bibliography{sc10}

\end{document} 
